Abstract

This paper studies the problem of support recovery of sparse signals based on multiple measurement
vectors (MMV). The MMV support recovery problem is connected to the problem of decoding messages
in a Single-Input Multiple-Output (SIMO) multiple access channel (MAC), thereby enabling an information
theoretic framework for analyzing performance limits in recovering the support of sparse signals.
Sharp sufficient and necessary conditions for successful support recovery are derived in terms of the
number of measurements per measurement vector, the number of nonzero rows, the measurement noise
level, and especially the number of measurement vectors. Through the interpretations of the results, in
particular the connection to the multiple output communication system, the benefit of having MMV for
sparse signal recovery is illustrated, providing a theoretical foundation for the performance improvement
enabled by MMV as observed in many existing simulation results. In particular, it is shown that the
structure (rank) of the matrix formed by the nonzero entries plays an important role in the performance
limits of support recovery.

I. Introduction 

Suppose the signal of interest is $X \in \mathbb{R}^{m \times l}$, and $X$ is said to be sparse when only a few of its rows
contain nonzero elements whereas the rest consist of zero elements. One wishes to estimate $X$ via the
linear measurements $Y = AX + Z$, where $A \in \mathbb{R}^{n \times m}$ is the measurement matrix and $Z \in \mathbb{R}^{n \times l}$ is the
measurement noise. The goal is to estimate $X$ from as few measurements as possible. Specifically, when
$l = 1$, this problem is usually termed sparse signal recovery with a single measurement vector (SMV);
when $l > 1$, it is referred to as sparse signal recovery with multiple measurement vectors (MMV) [1], [2].
This problem has received much attention in many disciplines, motivated by a broad array of applications

The material in this paper was presented in part at the Asilomar Conference on Signals, Systems, and Computers, Pacific
Grove, California, USA, 2010.
such as compressed sensing [3], [4], biomagnetic inverse problems [5], [6], image processing [7], [8],
robust face recognition [9], bandlimited extrapolation and spectral estimation [10], robust regression and
outlier detection [11], speech processing [12], channel estimation [13], [14], echo cancellation [15], [16],
body area networks [17], and wireless communication [13], [18].

A. Background on the SMV Problem 

For the problem of sparse signal recovery with SMV, computationally efficient algorithms have been
proposed to find or approximate the sparse solution $X \in \mathbb{R}^{m}$ in various settings. A partial list includes
matching pursuit [19], orthogonal matching pursuit (OMP) [20], Lasso [21], basis pursuit [22], FOCUSS
(H, iteratively reweighted $\ell_1$ minimization [23], iteratively reweighted $\ell_2$ minimization [24], sparse
Bayesian learning (SBL) [25], [26], finite rate of innovation [27], CoSaMP [28], and subspace pursuit
[29]. Analysis has been developed to shed light on the performance of these practical algorithms. For
example, Donoho [3], Donoho, Elad, and Temlyakov [30], Candes and Tao [31], and Candes, Romberg,
and Tao [32] presented sufficient conditions for $\ell_1$-norm minimization algorithms, including basis pursuit
and its variant in the noisy setting, to successfully recover the sparse signals with respect to different
performance metrics. Tropp [33], Tropp and Gilbert [34], and Donoho, Tsaig, Drori, and Starck [35]
studied the performance of greedy sequential selection methods such as matching pursuit and its variants.
Wainwright [36] and Zhao and Yu [37] provided sufficient and necessary conditions for Lasso to recover
the support of the sparse signal, i.e., the set of indices of the nonzero entries. On the other hand, from
an information theoretic perspective, a series of papers, for instance, Wainwright [38], Fletcher, Rangan,
and Goyal [39], Wang, Wainwright, and Ramchandran [40], Akçakaya and Tarokh [41], and Jin, Kim, and
Rao [42], provided sufficient and necessary conditions to characterize the performance limits of optimal
algorithms for support recovery, regardless of computational complexity.

B. Background on the MMV Problem 

As a fast emerging trend, the capability of collecting multiple measurements with an array of sensors
in an increasing number of applications, such as magnetoencephalography (MEG) and electroencephalography
(EEG) [1], [43], blind source separation [44], multivariate regression [45], and source localization
[46], gives rise to the problem of sparse signal recovery with multiple measurement vectors. Practical
algorithms have been developed to address the new challenges in this scenario. One class of algorithms
for solving the MMV problem can be viewed as straightforward extensions of their counterparts in
the SMV problem. To sample a few, M-OMP US], [47], M-FOCUSS 113, the $\ell_1/\ell_2$ minimization method$^{1}$
[48], multivariate group Lasso [45], and M-SBL [49] can all be viewed as examples of this kind. Another
class of algorithms additionally makes an explicit effort to exploit the structure underlying the sparse signal
$X$, such as the temporal correlation or the autoregressive nature across the columns of $X$, which would
be otherwise unavailable when $l = 1$, to aim for better performance of sparse signal recovery. For
instance, the improved M-FOCUSS algorithms IH and the auto-regressive sparse Bayesian learning (AR-SBL)
[50] both have the capability of explicitly taking advantage of the structural properties of $X$ to
improve the recovery performance. Alongside the algorithmic advancement, a series of works has
focused on theoretical analysis to support the effectiveness of existing algorithms for the MMV
problem. We briefly divide these results into two categories. The first category of theoretical analysis
aims at specific practical algorithms for sparse signal recovery with MMV. For example, Chen and Huo
[51] discovered sufficient conditions for the $\ell_1/\ell_p$ norm minimization method and orthogonal matching
pursuit to exactly recover every sparse signal within a certain sparsity level in the noiseless setting. Eldar
and Rauhut [52] also analyzed the performance of sparse recovery using the $\ell_1/\ell_2$ norm minimization
method in the noiseless setting, but the sparse signal was assumed to be randomly distributed according
to a certain probability distribution and the performance was averaged over all possible realizations of
the sparse signal. Obozinski, Wainwright, and Jordan [45] provided sufficient and necessary conditions
for multivariate group Lasso to successfully recover the support of the sparse signal$^{2}$ in the presence
of measurement noise. The second category of theoretical analysis is of an information theoretic nature,
and explores the performance limits that any algorithm, regardless of computational complexity, could
possibly achieve. In this regard, Tang and Nehorai [53] employed a hypothesis testing framework with the
likelihood ratio test as the optimal decision rule to study how fast the error probability decays. Sufficient
and necessary conditions are further identified in order to guarantee successful support recovery in the
asymptotic sense.

C. Focus and Contributions of This Paper 

We develop sharp asymptotic performance limits among the signal dimension $m$, the number of nonzero
rows $k$, the number of measurements per measurement vector $n$, and the number of measurement vectors

$^{1}$This method is sometimes referred to as $\ell_2/\ell_1$ minimization, due to the naming convention in a specific paper. In this paper,
we use $\ell_p/\ell_q$ to indicate a cost of a matrix $B$ which is defined as $\sum_i \big( \sum_j |B_{i,j}|^q \big)^{p/q}$.

$^{2}$We refer to the support of a matrix $X$ as the set of indices corresponding to the nonzero rows of $X$. It will be formally
defined in Section II.
$l$ for reliable support recovery in the noisy setting. We show that $n = (\log m)/c(X)$ is sufficient and
necessary. We give a complete characterization of $c(X)$ that depends on the elements of the nonzero rows
of $X$. Together with interpretations, we demonstrate the potential performance improvement enabled by
having MMV, and hence bolster its usage in practical applications. Our main results are inspired by the
analogy to communication over a Single-Input Multiple-Output (SIMO) multiple access channel (MAC).
According to this connection, the columns of the measurement matrix form a common codebook for
all senders. Codewords from the senders are individually multiplied by unknown channel gains, which
correspond to nonzero entries of $X$. Then, the noise corrupted linear combinations of these codewords are
observed by multiple receivers, which correspond to the multiple measurement vectors. The problem can
be viewed as $k$ single-antenna users communicating over a non-frequency selective channel with a base
station equipped with $l$ receive antennas. Thus, the problem of support recovery can be interpreted as
multiple receivers jointly decoding messages sent by multiple senders, i.e., a SIMO MAC. With
appropriate modifications, the techniques for deriving multiple-user channel capacity can be leveraged to
provide performance limits for support recovery.

In the literature on sparse signal recovery with SMV, the analogy between the problems of sparse
signal recovery and channel coding has been observed from various perspectives in previous work [54],
[55, Section IV-D], [40, Section II-A], [41, Section III-A], 051 Section 11.2]. However, their extensions
to the MMV problem are unavailable to the authors' knowledge. Moreover, our approach differs from
existing works and would be different from their possible extensions to the MMV scenario, if any. We
customize tools from multiple-user information theory to address the support recovery problem and we
obtain sharp performance limits in the form of tight sufficient and necessary conditions.

D. Organization of the Paper 

In Section II, we formally define the problem of support recovery of sparse signals in the presence of
MMV. To motivate the main results of the paper and their proof techniques, in Section III we discuss the
similarities and differences between the support recovery problem and the multiple access communication
problem. The main results of the paper are presented in Section IV along with the interpretations. The
proofs of the main theorems are presented in Appendices A and B. Relations to existing work are
discussed in Section V. Section VI concludes the paper with further discussions.
E. Notations

Throughout this paper, a set is a collection of unique objects. Let $\mathbb{R}^m$ denote the $m$-dimensional
real Euclidean space. Let $\mathbf{1}$ denote a column vector whose elements are all 1's, and its length can be
determined from the context. Let $\mathbb{N} = \{1, 2, 3, \ldots\}$ denote the set of natural numbers. Let $[k]$ denote the set
$\{1, 2, \ldots, k\}$. The notation $|S|$ denotes the cardinality of set $S$, $\|x\|_2$ denotes the $\ell_2$-norm of a vector $x$,
and $\|A\|_F$ denotes the Frobenius norm of a matrix $A$. For a matrix $A$, $A_i$ denotes its $i$th column, $A^i$
denotes its $i$th row, and $A_T$ denotes the submatrix formed by the rows of $A$ indexed by the set $T$.

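II. Problem Formulation

Let $W \in \mathbb{R}^{k \times l}$, where $W_{i,j} \neq 0$ for all $i, j$. Let $S = [S_1, \ldots, S_k]^{\top} \in [m]^k$ be such that $S_1, \ldots, S_k$
are chosen uniformly at random from $[m]$ without replacement. In particular, $\{S_1, \ldots, S_k\}$ is uniformly
distributed over all size-$k$ subsets of $[m]$. Then, the signal of interest $X = X(W, S)$ is generated row by row as

$$ X^{s} = \begin{cases} W^{i} & \text{if } s = S_i, \\ 0^{\top} & \text{if } s \notin \{S_1, \ldots, S_k\}. \end{cases} \tag{1} $$

The support of $X$, denoted by $\mathrm{supp}(X)$, is the set of indices corresponding to the nonzero rows of $X$,
i.e., $\mathrm{supp}(X) = \{S_1, \ldots, S_k\}$. According to the signal model (1), $|\mathrm{supp}(X)| = k$. Throughout this paper,
we assume $k$ is known.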

We measure X through the linear operation 

$$ Y = AX + Z \tag{2} $$

where $A \in \mathbb{R}^{n \times m}$ is the measurement matrix, $Z \in \mathbb{R}^{n \times l}$ is the measurement noise, and $Y \in \mathbb{R}^{n \times l}$ is the
noisy measurement. We assume that the elements of $A$ are independent and identically distributed (i.i.d.)
according to the Gaussian distribution $\mathcal{N}(0, \sigma_a^2)$, and the noise entries $Z_{i,j}$ are i.i.d. according to $\mathcal{N}(0, \sigma_z^2)$. We
assume $\sigma_a^2$ and $\sigma_z^2$ are known.

Upon observing the noisy measurement $Y$, the goal is to recover the indices of the nonzero rows of
$X$. A support recovery map is defined as

$$ d : \mathbb{R}^{n \times l} \mapsto 2^{[m]}. \tag{3} $$

Given the signal model (1), the measurement model (2), and the support recovery map (3), we define
the average probability of error by

$$ P\{ d(Y) \neq \mathrm{supp}(X(W, S)) \} $$

for each (unknown) signal value matrix $W \in \mathbb{R}^{k \times l}$. Note that the probability is averaged over the
randomness of the locations of the nonzero rows $S$, the measurement matrix $A$, and the measurement
noise $Z$.
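
The generative model of this section can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function name `generate_mmv`, the example matrix `W`, and the parameter values are hypothetical and do not come from the paper, and indices are 0-based rather than the 1-based convention of the text.

```python
import numpy as np

def generate_mmv(W, m, n, sigma_a=1.0, sigma_z=0.1, seed=0):
    """Draw one realization (Y, A, X, support) of the model Y = A X + Z.

    Hypothetical helper: W is the k x l matrix of nonzero rows, m the signal
    dimension, n the number of measurements per measurement vector.
    """
    rng = np.random.default_rng(seed)
    k, l = W.shape
    # S: k distinct row indices drawn uniformly without replacement (0-based).
    S = rng.choice(m, size=k, replace=False)
    X = np.zeros((m, l))
    X[S, :] = W                                  # row S_i of X equals row i of W
    A = rng.normal(0.0, sigma_a, size=(n, m))    # i.i.d. N(0, sigma_a^2) entries
    Z = rng.normal(0.0, sigma_z, size=(n, l))    # i.i.d. N(0, sigma_z^2) entries
    Y = A @ X + Z
    return Y, A, X, set(S.tolist())

# Example values chosen only for illustration (all entries of W are nonzero).
W = np.array([[1.0, 1.0], [1.0, -1.0], [0.5, 2.0]])
Y, A, X, support = generate_mmv(W, m=50, n=20)

# The support is the set of indices of the nonzero rows of X.
nonzero_rows = set(np.flatnonzero(np.any(X != 0, axis=1)).tolist())
```

A support recovery map would take only `Y` (and the known `A`, `k`, noise levels) and return a guess for `support`; the sketch does not implement such a map.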

III. Interpretation of Support Recovery via Multiple-User Communication 

We introduce an important interpretation of the problem of support recovery of sparse signals by relating 
it to a single-input multiple-output (SIMO) multiple access channel (MAC) communication problem. This 
relationship motivates the intuition behind our main results and facilitates the development of the proof
techniques. It can also be viewed as an MMV extension of our earlier work [56], in which a similar
connection was employed to interpret the support recovery problem with SMV.

A. Brief Review on SIMO MAC 

Consider the following wireless communication scenario. Suppose $k$ senders wish to transmit information
to a set of $l$ common receivers. Each sender $i$ has access to a codebook $\mathcal{C}^{(i)} = \{c_1^{(i)}, c_2^{(i)}, \ldots, c_{m^{(i)}}^{(i)}\}$,
where $c_j^{(i)} \in \mathbb{R}^n$ is a codeword and $m^{(i)}$ is the number of codewords in the codebook. The rate for sender
$i$ is $R^{(i)} = (\log m^{(i)})/n$. To transmit information, each sender chooses a codeword from its codebook, and
all senders transmit their codewords simultaneously to $l$ receivers, leading to the SIMO MAC problem:

$$ Y_{j,i} = h_{j,1} X_{1,i} + h_{j,2} X_{2,i} + \cdots + h_{j,k} X_{k,i} + Z_{j,i}, \quad i = 1, 2, \ldots, n, \text{ and } j = 1, 2, \ldots, l \tag{4} $$

where $X_{q,i}$ denotes the input symbol from sender $q$ to the channel at the $i$th use of the channel, $h_{j,q}$
denotes the channel gain between sender $q$ and receiver $j$, $Z_{j,i}$ is the additive Gaussian noise i.i.d.
according to $\mathcal{N}(0, \sigma_z^2)$, and $Y_{j,i}$ is the channel output at receiver $j$ at the $i$th use of the channel.

After receiving $Y_{j,1}, \ldots, Y_{j,n}$ at each receiver $j \in [l]$, the receivers work jointly to determine the
codewords transmitted by each sender. Since the senders interfere with each other, there is an inherent
tradeoff among their operating rates. The notion of capacity region is introduced to capture this tradeoff
by characterizing all possible rate tuples $(R^{(1)}, R^{(2)}, \ldots, R^{(k)})$ at which reliable communication can be
achieved with diminishing error probability of decoding. By assuming each sender obeys the power
constraint $\|c_j^{(i)}\|_2^2 / n \leq \sigma_c^2$ for all $j \in [m^{(i)}]$ and all $i \in [k]$, the capacity region of a SIMO MAC with
known channel gains [57] is

$$ \left\{ (R^{(1)}, \ldots, R^{(k)}) : \sum_{i \in T} R^{(i)} \leq \frac{1}{2} \log\det\left( I + \frac{\sigma_c^2}{\sigma_z^2} \sum_{i \in T} h_i h_i^{\top} \right), \ \forall\, T \subseteq [k] \right\} \tag{5} $$

where $h_i = [h_{1,i}, \ldots, h_{l,i}]^{\top}$ for $i \in [k]$.
B. Similarities and Differences to the Problem of Support Recovery

Based on the measurement model (2), we can remove the columns in $A$ which correspond to the zero
rows of $X$, and obtain the following effective form of the measurement procedure

$$ Y_j = X_{S_1,j} A_{S_1} + \cdots + X_{S_k,j} A_{S_k} + Z_j \tag{6} $$

for $j \in [l]$. By contrasting (6) to the SIMO MAC (4), we can draw the following key connections that
relate the two problems [58].

i) A nonzero entry as a sender: We can view the existence of a nonzero row index $S_i$ as sender $i$
that accesses the channel. Since there are $k$ nonzero rows, this results in $k$ users, leading to the
MAC analogy.

ii) A measurement vector as a receiver: We can view the existence of a measurement vector $Y_j$ as
a measurement at receiver $j$. The multiple receivers lead to the multiple output (MO) part of the
analogy.

iii) $X_{S_i,j}$ as the channel gain: The nonzero entry $X_{S_i,j}$, i.e., $W_{i,j}$, plays the role of the channel gain
$h_{j,i}$ from the $i$th sender to the $j$th receiver.

iv) $A_i$ as the codeword: We treat the measurement matrix $A$ as a codebook with each column $A_i$,
$i \in [m]$, as a codeword. Each element of $A_{S_i}$ is fed one by one through the channel as input
symbols for the $i$th sender to the $l$ receivers, resulting in $n$ uses of the channel. Since a user
transmits a single stream, this leads to the single input (SI) part of the analogy.

v) Similarity of objectives: In the problem of sparse signal recovery, we focus on finding the support
$\{S_1, \ldots, S_k\}$ of the signal. In the problem of MAC communication, the receiver needs to determine
the indices of codewords, i.e., $S_1, \ldots, S_k$, that are transmitted by the senders.
Based on the above-mentioned aspects, the two problems share significant similarities, which enable
leveraging the information theoretic methods for the SIMO MAC problem for the performance analysis
of support recovery of sparse signals. However, there are domain specific differences between the support
recovery problem and the channel coding problem that should be addressed accordingly to rigorously
apply the information theoretic approaches [56].

1) Common codebook: In MAC communication, each sender uses its own codebook. However, in 
sparse signal recovery, the "codebook" A is shared by all "senders". All senders choose their 
codewords from the same codebook and hence operate at the same rate. Different senders will not 
choose the same codeword, or they will collapse into one sender. 
2) Unknown channel gains: In MAC communication, the capacity region (5) is valid assuming that
the receiver knows the channel gain $h_i$ [59]. In contrast, for the sparse signal recovery problem, $X^{S_i}$
is actually unknown and needs to be estimated. Although coding techniques and capacity results
are available for communication with channel uncertainty, a closer examination indicates that those
results are not directly applicable to our problem. For instance, channel training with pilot symbols
is a common practice to combat channel uncertainty [60]. However, it is not obvious how to
incorporate the training procedure into the measurement model (2), and hence the related results
are not directly applicable.
Once these differences are properly accounted for, the connection between the problems of sparse 
signal recovery and channel coding makes available a variety of information theoretic tools for handling 
performance issues pertaining to the support recovery problem. Based on techniques that are rooted in 
channel capacity results, but suitably modified to deal with the differences, we present the main results 
of this paper in the next section. 



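IV. Main Results and Their Interpretations

A. Main Results

We consider the support recovery of a sequence of sparse signals generated with the same signal value
matrix $W$. In particular, we assume that $k$ and $l$ are fixed. Define the auxiliary quantity

$$ c(W) \triangleq \min_{T \subseteq [k]} \left[ \frac{1}{2|T|} \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} W_T^{\top} W_T \right) \right]. \tag{7} $$

The following two theorems summarize the main results. The proofs are presented in Appendices A and
B.

Theorem 1: If

$$ \limsup_{m \to \infty} \frac{\log m}{n_m} < c(W) \tag{8} $$

then there exists a sequence of support recovery maps $\{d^{(m)}\}_{m=k}^{\infty}$, $d^{(m)} : \mathbb{R}^{n_m \times l} \mapsto 2^{[m]}$, such that

$$ \lim_{m \to \infty} P\{ d^{(m)}(Y) \neq \mathrm{supp}(X(W, S)) \} = 0. \tag{9} $$

Theorem 2: If

$$ \limsup_{m \to \infty} \frac{\log m}{n_m} > c(W) \tag{10} $$

then for any sequence of support recovery maps $\{d^{(m)}\}_{m=k}^{\infty}$, $d^{(m)} : \mathbb{R}^{n_m \times l} \mapsto 2^{[m]}$,

$$ \liminf_{m \to \infty} P\{ d^{(m)}(Y) \neq \mathrm{supp}(X(W, S)) \} > 0. \tag{11} $$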
Theorems 1 and 2 together indicate that $n = \frac{1}{c(W) \pm \epsilon} \log m$ (with $\epsilon > 0$ arbitrarily small) is the sufficient and necessary number of
measurements per measurement vector to ensure asymptotically successful support recovery. The constant
$c(W)$ explicitly captures the role of the nonzero entries in the performance tradeoff.
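
To make the role of $c(W)$ concrete, the following minimal sketch evaluates it by brute force over all nonempty row subsets $T$. Several things here are assumptions rather than part of the paper: the names `c_of_W` and `min_measurements`, the use of natural logarithms, the shorthand `rho` for $\sigma_a^2/\sigma_z^2$, and the reading of the asymptotic condition of Theorem 1 as a finite-$m$ rule of thumb. The log-determinant is computed from the singular values of $W_T$, since the nonzero eigenvalues of $W_T^{\top} W_T$ are the squared singular values.

```python
from itertools import combinations
import numpy as np

def c_of_W(W, rho):
    """Brute-force c(W): min over nonempty T of log det(I + rho W_T^T W_T) / (2|T|).

    rho stands for sigma_a^2 / sigma_z^2 (assumed shorthand); logs are natural.
    The cost is exponential in k, so this is only meant for small examples.
    """
    k = W.shape[0]
    best = np.inf
    for size in range(1, k + 1):
        for T in combinations(range(k), size):
            s = np.linalg.svd(W[list(T), :], compute_uv=False)
            logdet = np.sum(np.log1p(rho * s**2))   # equals log det(I + rho W_T^T W_T)
            best = min(best, logdet / (2 * size))
    return best

def min_measurements(W, rho, m):
    """Smallest integer n with log(m)/n < c(W); a heuristic finite-m reading."""
    return int(np.floor(np.log(m) / c_of_W(W, rho))) + 1

# Illustrative values only: k = 3 rows, a single measurement vector.
W = np.ones((3, 1))
rho, m = 10.0, 1000
c = c_of_W(W, rho)
n = min_measurements(W, rho, m)
```

For this all-ones single-column example the minimum is attained at $T = [k]$, so `c` should agree with $\frac{1}{2k}\log(1 + k\rho)$.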



B. Interpretations of the Main Results 

We further explore the implications of having multiple measurement vectors. Due to the complicated
nature of the expression for $c(W)$, we will employ different approximations to make the interpretations
more accessible.

1) The Low-Noise-Level Scenario: We consider the case where $\sigma_z^2$ is sufficiently small. Let $\lambda_{T,i}$, $T \subseteq [k]$,
denote the $i$th largest eigenvalue of $W_T^{\top} W_T$. For a SIMO MAC problem, the sum capacity grows
as $\min(k, l)$, leading to significant gains in the task of support recovery. This is captured in the following
corollary.
corollary. 

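Corollary 1: For a given $W$, suppose $\mathrm{rank}(W_T^{\top} W_T) = \min(|T|, l)$ for all $T \subseteq [k]$. For sufficiently
small $\sigma_z^2 > 0$, there exists a constant $\alpha \in (0, 1)$ such that if

$$ \limsup_{m \to \infty} \frac{\log m}{n_m} < \alpha \cdot \frac{\min(k, l)}{2k} \cdot \log \frac{\sigma_a^2}{\sigma_z^2} \tag{12} $$

then there exists a sequence of support recovery maps $\{d^{(m)}\}_{m=k}^{\infty}$, $d^{(m)} : \mathbb{R}^{n_m \times l} \mapsto 2^{[m]}$, such that

$$ \lim_{m \to \infty} P\{ d^{(m)}(Y) \neq \mathrm{supp}(X(W, S)) \} = 0. $$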

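Proof: Note that for $T \subseteq [k]$ with $|T| \leq l$, $\lambda_{T,i} > 0$ for $i = 1, 2, \ldots, |T|$. Thus

$$ \frac{1}{2|T|} \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} W_T^{\top} W_T \right) = \frac{1}{2|T|} \log \prod_{i=1}^{|T|} \left( 1 + \frac{\sigma_a^2}{\sigma_z^2} \lambda_{T,i} \right) $$

$$ = \frac{1}{2|T|} \left[ |T| \cdot \log \frac{\sigma_a^2}{\sigma_z^2} + \sum_{i=1}^{|T|} \log\left( \frac{\sigma_z^2}{\sigma_a^2} + \lambda_{T,i} \right) \right] $$

$$ = \frac{1}{2} \log \frac{\sigma_a^2}{\sigma_z^2} \left[ 1 + \frac{1}{|T|} \sum_{i=1}^{|T|} \frac{\log\left( \frac{\sigma_z^2}{\sigma_a^2} + \lambda_{T,i} \right)}{\log \frac{\sigma_a^2}{\sigma_z^2}} \right] \geq \frac{1}{2} \log \frac{\sigma_a^2}{\sigma_z^2} \cdot \alpha_T $$

for some $\alpha_T \in (0, 1)$. For any possible $T \subseteq [k]$ with $|T| > l$, $\lambda_{T,i} > 0$ for $i = 1, 2, \ldots, l$. Then, we have
similarly

$$ \frac{1}{2|T|} \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} W_T^{\top} W_T \right) = \frac{l}{2|T|} \log \frac{\sigma_a^2}{\sigma_z^2} \left[ 1 + \frac{1}{l} \sum_{i=1}^{l} \frac{\log\left( \frac{\sigma_z^2}{\sigma_a^2} + \lambda_{T,i} \right)}{\log \frac{\sigma_a^2}{\sigma_z^2}} \right] \geq \frac{l}{2|T|} \log \frac{\sigma_a^2}{\sigma_z^2} \cdot \alpha_T. $$

Thus, if $k \leq l$,

$$ \min_{T \subseteq [k]} \frac{1}{2|T|} \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} W_T^{\top} W_T \right) \geq \frac{1}{2} \log \frac{\sigma_a^2}{\sigma_z^2} \cdot \min_{T \subseteq [k]} \alpha_T \tag{13} $$

and if $k > l$,

$$ \min_{T \subseteq [k]} \frac{1}{2|T|} \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} W_T^{\top} W_T \right) \geq \frac{l}{2k} \log \frac{\sigma_a^2}{\sigma_z^2} \cdot \min_{T \subseteq [k]} \alpha_T. \tag{14} $$

Combining (13) and (14) and applying Theorem 1 complete the proof. ■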

Corollary 1 indicates the following observations. First, as the measurement noise level $\sigma_z^2$ approaches
zero, the term $\frac{\min(k, l)}{2k} \log \frac{\sigma_a^2}{\sigma_z^2}$ exerts a major influence on the sufficient condition (12). The nonzero signal
matrix $W$ plays its role mainly through the ranks of its row-wise submatrices, which are ensured to be
full rank according to the technical assumption that $\mathrm{rank}(W_T^{\top} W_T) = \min(|T|, l)$ for any $T \subseteq [k]$.

Second, by rearranging the terms in (12), we obtain

$$ m < \left( \frac{\sigma_a^2}{\sigma_z^2} \right)^{\alpha \cdot \min(k, l) \cdot \frac{n}{2k}} $$

which corresponds to the maximum number of columns of $A$ that still yields a diminishing error
probability in support recovery. Specifically, the term $\min(k, l)$ reveals the following insight. In the
scenario with sufficiently small $\sigma_z^2$, for the challenging problem where the number of measurement
vectors is less than the number of nonzero rows, i.e., $l < k$, adding one more measurement vector can
lead to a much larger upper bound on the manageable number of columns of $A$. On the other hand,
when $k \leq l$, the problem is much simpler and adding more measurement vectors may not significantly
increase the manageable size of $A$. From an algorithmic point of view, subspace based methods can be
used to recover the support in the latter case.
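
The low-noise behavior can be checked numerically. The sketch below is illustrative only: the helper `c_of_W`, the example matrix, and the shorthand `rho` for $\sigma_a^2/\sigma_z^2$ are assumptions introduced here. It evaluates $c(W)$ by brute force for a $4 \times 2$ matrix whose row submatrices all have full rank, and compares the ratio $c(W)/\log\rho$ with $\min(k, l)/(2k)$ as the noise level shrinks. The log-determinant is computed from singular values, which stays numerically stable when `rho` is very large.

```python
from itertools import combinations
import numpy as np

def c_of_W(W, rho):
    """Brute-force c(W) with rho = sigma_a^2 / sigma_z^2 (assumed shorthand)."""
    k = W.shape[0]
    best = np.inf
    for size in range(1, k + 1):
        for T in combinations(range(k), size):
            s = np.linalg.svd(W[list(T), :], compute_uv=False)
            logdet = np.sum(np.log1p(rho * s**2))   # log det(I + rho W_T^T W_T)
            best = min(best, logdet / (2 * size))
    return best

# Hypothetical example with k = 4 nonzero rows and l = 2 measurement vectors;
# every pair of rows is linearly independent, so the rank assumption holds.
W = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 2.0], [2.0, 1.0]])
k, l = W.shape
limit = min(k, l) / (2 * k)

ratios = {}
for rho in (1e2, 1e6, 1e30):
    ratios[rho] = c_of_W(W, rho) / np.log(rho)
    print(f"rho = {rho:.0e}: c(W)/log(rho) = {ratios[rho]:.4f}, limit = {limit:.4f}")
```

The convergence is only logarithmic in `rho`, which is one way to read the constant $\alpha \in (0, 1)$ in the corollary: at any finite noise level the ratio differs from the limiting value by a correction that depends on the eigenvalues of the submatrices.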

2) The Role of the Nonzero Signal Matrix: Next, we take a closer look at the role of the nonzero
signal matrix $W$ in support recovery with MMV. We consider two different cases. In the first case, $W$
consists of identical columns. The following corollary states the corresponding sufficient condition for
support recovery.
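Corollary 2: Suppose $W \in \mathbb{R}^{k \times l}$ has identical columns, i.e., $W = [w, \ldots, w]$, for some $w \in \mathbb{R}^{k}$ with
all entries being nonzero. If

$$ \limsup_{m \to \infty} \frac{\log m}{n_m} < \min_{T \subseteq [k]} \frac{1}{2|T|} \log\left( 1 + l \cdot \frac{\sigma_a^2 \|w_T\|_2^2}{\sigma_z^2} \right) \tag{15} $$

then there exists a sequence of support recovery maps $\{d^{(m)}\}_{m=k}^{\infty}$, $d^{(m)} : \mathbb{R}^{n_m \times l} \mapsto 2^{[m]}$, such that

$$ \lim_{m \to \infty} P\{ d^{(m)}(Y) \neq \mathrm{supp}(X(W, S)) \} = 0. $$

Proof: Note that, for any $T \subseteq [k]$,

$$ \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} W_T^{\top} W_T \right) = \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} [w_T, \ldots, w_T]^{\top} [w_T, \ldots, w_T] \right) $$

$$ = \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} \|w_T\|_2^2 \, \mathbf{1} \mathbf{1}^{\top} \right) = \log\left( 1 + l \cdot \frac{\sigma_a^2}{\sigma_z^2} \|w_T\|_2^2 \right). $$

Applying Theorem 1 completes the proof. ■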
Based on (15), the effect of having $l$ identical nonzero signal vectors is equivalent to decreasing
the noise level by a factor of $l$, compared to the problem with SMV. This is in accordance with the
intuition that when the underlying signals remain the same, taking more measurement vectors provides 
an opportunity to average down the measurement noise level. We hasten to add that identical columns 
are unlikely in practice. Even small changes in the coefficients can lead to a full rank matrix, leading to 
significant benefits in the high signal-to-noise ratio (SNR) case. 
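
The determinant identity behind the identical-columns case can be verified numerically. The snippet below is a small sanity check under assumed values: the vector `w`, the subset `T`, and the shorthand `rho` for $\sigma_a^2/\sigma_z^2$ are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, rho = 4, 3, 2.5                       # rho plays the role of sigma_a^2 / sigma_z^2
w = rng.uniform(0.5, 2.0, size=k)           # a vector with nonzero entries
W = np.tile(w[:, None], (1, l))             # k x l matrix with l identical columns

T = [0, 2, 3]                               # an arbitrary subset of rows (0-based)
WT = W[T, :]

# Left-hand side: log det(I + rho * W_T^T W_T), an l x l determinant.
_, lhs = np.linalg.slogdet(np.eye(l) + rho * WT.T @ WT)
# Right-hand side: log(1 + l * rho * ||w_T||^2), the rank-one closed form.
rhs = np.log1p(l * rho * np.sum(w[T] ** 2))
```

The two sides agree because $W_T^{\top} W_T = \|w_T\|_2^2 \, \mathbf{1}\mathbf{1}^{\top}$ has a single nonzero eigenvalue equal to $l \|w_T\|_2^2$, which is the same expression one would obtain from a single measurement vector with the noise variance divided by $l$.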

In the second case, we construct a special example to achieve a large performance improvement via a 
second measurement. This is demonstrated in the following corollary. 

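Corollary 3: Suppose $W = [w_1, w_2] \in \mathbb{R}^{k \times 2}$, where $k$ is even, $w_1 = \mathbf{1} \in \mathbb{R}^{k}$, and $w_2$ is defined as

$$ w_{2,i} = \begin{cases} 1 & \text{if } 1 \leq i \leq \frac{k}{2}, \\ -1 & \text{if } \frac{k}{2} < i \leq k. \end{cases} \tag{16} $$

If

$$ \limsup_{m \to \infty} \frac{\log m}{n_m} < \frac{1}{k} \log\left( 1 + k \frac{\sigma_a^2}{\sigma_z^2} \right) \tag{17} $$

then there exists a sequence of support recovery maps $\{d^{(m)}\}_{m=k}^{\infty}$, $d^{(m)} : \mathbb{R}^{n_m \times l} \mapsto 2^{[m]}$, such that

$$ \lim_{m \to \infty} P\{ d^{(m)}(Y) \neq \mathrm{supp}(X(W, S)) \} = 0. $$

Proof: Please see Appendix C. ■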
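For the ease of illustration, we compare the performance among the problems with (i) SMV where
$W = \mathbf{1} \in \mathbb{R}^{k \times 1}$, (ii) MMV where $W = [\mathbf{1}, \mathbf{1}] \in \mathbb{R}^{k \times 2}$, and (iii) MMV where $W$ is defined in Corollary
3, for an even $k$.$^{3}$ The following table summarizes the results.

                                             lower bound on $n$                                              upper bound on $m$
(i) SMV ($W = \mathbf{1}$)                   $n > \frac{2k \log m}{\log(1 + k \sigma_a^2/\sigma_z^2)}$       $m < (1 + k \sigma_a^2/\sigma_z^2)^{n/(2k)}$
(ii) MMV ($W = [\mathbf{1}, \mathbf{1}]$, Corollary 2)   $n > \frac{2k \log m}{\log(1 + 2k \sigma_a^2/\sigma_z^2)}$      $m < (1 + 2k \sigma_a^2/\sigma_z^2)^{n/(2k)}$
(iii) MMV ($W$ as defined in Corollary 3)    $n > \frac{k \log m}{\log(1 + k \sigma_a^2/\sigma_z^2)}$        $m < (1 + k \sigma_a^2/\sigma_z^2)^{n/k}$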
Based on this table, we have the following observations for this specific setup. First, compared with 
the SMV problem, having MMV can improve the performance of support recovery by enabling a relaxed 
condition on the number of measurements n. Equivalently, for the same number of measurements per 
measurement vector, the MMV setup permits a measurement matrix A with more columns. Second, 
the performance improvement enabled by having MMV is closely related to c{W), and it can be quite 
different for different nonzero signal value matrices. In case (ii), we achieve a moderate performance 
gain which is equivalent to reducing the noise level by half. On the contrary, in case (iii), a larger 
performance gain can be achieved due to the structure of the nonzero signal value matrix. Note that the 
change occurs in the factor in the exponent in the upper bound for m. In summary, these examples are 
specially constructed as representative cases to illustrate the effect of the nonzero signal value matrix W 
in support recovery. Generally, the difficulty of a support recovery problem is inherently determined by 
the model parameters and Theorems 1 and 2 together characterize their exact roles. 
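
The three setups compared above can be evaluated numerically as a cross-check. The following is a sketch under stated assumptions: the helper `c_of_W`, the shorthand `rho` for $\sigma_a^2/\sigma_z^2$, and the chosen values of `k`, `rho`, and `m` are hypothetical, natural logarithms are used, and the closed forms it is compared against are the ones obtained from Corollaries 2 and 3 with an all-ones vector.

```python
from itertools import combinations
import numpy as np

def c_of_W(W, rho):
    """Brute-force c(W) with rho = sigma_a^2 / sigma_z^2 (assumed shorthand)."""
    k = W.shape[0]
    best = np.inf
    for size in range(1, k + 1):
        for T in combinations(range(k), size):
            s = np.linalg.svd(W[list(T), :], compute_uv=False)
            logdet = np.sum(np.log1p(rho * s**2))   # log det(I + rho W_T^T W_T)
            best = min(best, logdet / (2 * size))
    return best

k, rho, m = 4, 10.0, 10**6                  # illustrative values, k even
ones = np.ones((k, 1))
w2 = np.concatenate([np.ones(k // 2), -np.ones(k // 2)])[:, None]

cases = {
    "(i) SMV": ones,
    "(ii) MMV, identical columns": np.hstack([ones, ones]),
    "(iii) MMV, Corollary 3": np.hstack([ones, w2]),
}
c_values = {name: c_of_W(W, rho) for name, W in cases.items()}
for name, c in c_values.items():
    # log(m)/c(W) is the threshold on n suggested by Theorems 1 and 2.
    print(f"{name}: c(W) = {c:.4f}, threshold on n = {np.log(m) / c:.1f}")
```

In this example the threshold on $n$ in case (iii) is half of that in case (i), whereas case (ii) only replaces $k\rho$ by $2k\rho$ inside the logarithm, which matches the qualitative comparison drawn from the table.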

3) A Generalization of W: Thus far, we have assumed $W_{i,j} \neq 0$ for all $i, j$ in the discussion above.
Now, we generalize $W$ in the following manner: for each $i \in [k]$, there exists a $j \in [l]$ such that $W_{i,j} \neq 0$;
meanwhile, for each $j \in [l]$, there exists an $i \in [k]$ such that $W_{i,j} \neq 0$. This relaxed assumption indicates
that neither a zero row nor a zero column exists but zero elements are allowed in $W$, as opposed to the
original assumption that all elements of $W$ are nonzero. Accordingly,

$$ \mathrm{supp}(X) = \bigcup_{j=1}^{l} \mathrm{supp}(X_j) $$

$^{3}$Note that $\|w_1\|_2 = \|w_2\|_2$, which can be viewed as a way of normalization to make the comparison meaningful.
which means the support of $X$ is equal to the union of the supports of all columns of $X$. Following
the proofs of Theorems 1 and 2, one can readily see that the two theorems still hold in this case.

It is worthwhile to note that having more measurement vectors does not necessarily result in performance
improvement. To illustrate this point, we construct a simple example. Let $W^{(1)} = [0.1, 5]^{\top}$,
$W^{(2)} = \begin{bmatrix} 0.1 & 0 \\ 5 & 6 \end{bmatrix}$, and $\sigma_a^2/\sigma_z^2 = 10$. As a result, $c(W^{(1)}) = c(W^{(2)}) = \frac{1}{2} \log 1.1$. This means that the
performance limits for these two setups are the same. Intuitively, by inspecting the definition of $c(W)$, it
can be seen that if a submatrix composed of certain rows of $W$ is ill-conditioned, the minimization inside
$c(W)$ is likely to be determined by that submatrix. Hence, for an extra measurement vector to benefit
support recovery, this measurement vector should correspond to a column of $W$ whose presence improves
the small eigenvalues of the previous worst-case submatrix that causes the performance bottleneck. These
observations are reminiscent of some of the intuition developed in space-time wireless communication
systems [61]. The $l$ receivers can be viewed as an $l$-antenna receiver, and it is known that the rank of the
channel matrix plays an important role in the high SNR case. The correlation between the channel gains
is not as harmful in this context. The gain of having multiple receive antennas is lower at low SNR.
V. Relation to Existing Results 
We discuss the relation between the main results in this paper and existing results in the literature. 

A. Relation to the Performance of Practical Algorithms 

Our analysis provides the performance limit that governs all possible support recovery algorithms. 
This is achieved by a theoretic support recovery method which has exponential complexity and therefore 
is impractical. However, it is interesting to make comparisons with performance limits of practical 
algorithms, since it provides insight into the potential gap between the performance of a practical algorithm 
and the fundamental performance limit, and suggests possibilities for performance improvement. 

We note that the model employed in Obozinski, Wainwright, and Jordan [45] is similar to the measurement
model (2). Sufficient and necessary conditions are derived therein for multivariate group Lasso to
successfully recover the support of the sparse signal in the presence of noise, as $m$, $n$, and $k$ jointly grow
to infinity in a certain manner.$^{4}$ This is different from our assumption that $k$ is fixed. Although a direct
comparison may seem difficult, we offer the following intuitive discussion. Note that Example 1
in [45, Section 2.3] considered the case of identical regression, which means the nonzero signal matrix
$W$ has identical columns. The conclusion therein is that multivariate group Lasso offers no performance
improvement under the MMV formulation compared with using Lasso on an SMV formulation with one
measurement vector. However, our Corollary 2 indicates that the effect of having $l$ identical columns
in $W$ is equivalent to lowering the noise level by a factor of $l$. The different performances indicated
by multivariate group Lasso and the information theoretic analysis lead to the following observation. In
general, if the sparse signal to be recovered possesses a strong structural property, an algorithm needs to
take advantage of this factor in order to achieve better performance. For multivariate group Lasso, the
$\ell_1/\ell_p$ cost term completely ignores the row-wise structure present in the nonzero entries. In contrast,
AR-SBL [50] is developed based on the assumption that the elements of $W$ are drawn from an autoregressive
process, and it explicitly attempts to learn this correlation structure. Based on the experimental
study presented in [50], notable performance improvement in support recovery was observed when such
correlation is present, including the case when the columns of $W$ were highly correlated.



B. Relation to Information Theoretic Performance Analysis 

Under the assumption that $\sigma_a^2 = 1$ and the elements of $W$ are i.i.d. according to $\mathcal{N}(0, 1)$,$^{5}$ Tang and
Nehorai [53] identify sufficient and necessary conditions, involving the model parameters (i.e., $m$, $n$, $k$, $l$,
and $\sigma_z^2$), to ensure diminishing error probability in support recovery as the problem size grows to infinity.
We restate the sufficient condition to facilitate the discussion.

Theorem 3 ( / [53] Theorem 5]): Suppose that n = i}{klog^) and | log ^ » \og{k{'m — k)), then 
with probability one the error probability vanishes. In particular, if n = r2(fclog^) and / ^ log'fog'rra ' 
the error probability vanishes as m — > oo. 

As noted in [53], heuristically, when $l = 1$, $n \gg m$ is needed to guarantee asymptotically successful
support recovery. Although our main results aim for the case with fixed W, intuitive observations can 

Note that it is stated at the end of Section 3.3 of [45] that the requirement on $k$ growing to infinity can be removed. The
remark therein provided an alternative probability upper bound for the intermediate term $T_1$ such that this bound can drop to
zero even for a fixed $k$. However, it seems that the other intermediate term $T_2$ still relies on a probability upper bound that
involves a term decaying exponentially in $k$, which requires an increasing $k$ to drive it to zero.

^We only consider the real case in this discussion. 
still be drawn to provide more insight into the behavior of the support recovery with random $W$. To
see this, recall that for a sequence of support recovery problems with a fixed $W$, the quantity $c(W)$
inherently determines the performance limit and the sufficient condition is $n > \frac{1}{c(W)} \log m$. Now, let us
assume that the elements of $W$ are i.i.d. according to a certain distribution with bounded support. Thus, in
general, for any constant $\delta > 0$, the probability $P(c(W) < \delta)$ may be strictly positive. This implies that
for the scaling $n = O(\log m)$, the error probability will not converge to zero because there is a nontrivial
probability of poor realizations of $W$ such that the sufficient condition above cannot be satisfied. As one
plausible solution, we need $n$ to grow with $m$ at a much faster rate to ensure that the sufficient condition
above can be met with probability converging to one. 

VI. Conclusion and Discussion 

We have developed performance limits for support recovery of sparse signals when multiple measure- 
ment vectors are available. Sufficient and necessary conditions are obtained for support recovery to be 
asymptotically successful. In particular, the role of the nonzero entries in the performance limits is explicitly
characterized, and the quantity $c(W)$ captures the effect of all nonzero entries. The key technique that
enabled our analysis is motivated by the connection between sparse signal recovery with MMV and 
multiple access communication over SIMO channels. This leads to the opportunity of leveraging the 
methodology for deriving SIMO MAC capacity to help understand the performance limits of sparse 
signal recovery with MMV. Interpretations of the main results were provided in order to demonstrate the 
performance improvement by having MMV, and relations to existing results were also discussed. 

The proposed methodology also has the potential to address other theoretical and practical issues 
associated with sparse signal recovery. First, this analytical approach can be extended to deal with the 
case where the signal value matrix W is random. Outage analysis for fading channels can be leveraged 
to reveal the performance limits for sparse signal recovery in this case. Second, one can consider the 
problem where recovering a partial support is also desirable if recovering the full support is not possible
[62]. This can be achieved by treating a subset of users as noise and examining the capacity region of the
remaining users. The connection between sparse signal recovery and multiple access communication offers 
the opportunity to explore the connection between sparse recovery algorithms and multiuser detection 
techniques with potential for cross-fertilization. A sender with larger channel gain may be easier to detect 
than a sender with weaker channel gain. The successive interference cancellation (SIC) scheme
aims to detect users in a sequential manner, where the remaining undetected users are treated as
noise; this bears a strong resemblance to the matching pursuit algorithms for sparse signal recovery. It is
conceivable that by appropriately utilizing the techniques for channel coding, performance limits could
be obtained for partial support recovery of sparse signals.
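The resemblance between SIC and matching pursuit can be made concrete with a small sketch. The function below is a generic greedy (matching-pursuit-style) detector written for illustration, not an algorithm taken from the paper: at each step it picks the column that explains the most residual energy across all measurement vectors (the "strongest user"), cancels its contribution, and repeats. The function name, the toy dictionary with orthogonal columns, and the signal values are all assumptions made for the example.

```python
import math


def simultaneous_matching_pursuit(columns, Y, k):
    """Greedy support detection for Y = A X + Z with a row-sparse X.

    columns: list of the m columns of A, each a list of n floats.
    Y: n x l matrix given as a list of rows. k: number of rows to detect.
    """
    n, l = len(Y), len(Y[0])
    residual = [row[:] for row in Y]
    support = []
    for _ in range(k):
        best_j, best_score, best_coef = None, -1.0, None
        for j, col in enumerate(columns):
            if j in support:
                continue
            norm2 = sum(v * v for v in col)
            # Least-squares coefficient of the residual on column j,
            # one coefficient per measurement vector.
            coef = [sum(col[i] * residual[i][t] for i in range(n)) / norm2
                    for t in range(l)]
            # Square root of the residual energy explained by column j.
            score = math.sqrt(norm2 * sum(c * c for c in coef))
            if score > best_score:
                best_j, best_score, best_coef = j, score, coef
        support.append(best_j)
        col = columns[best_j]
        # Cancel the detected "user" before looking for the next one.
        for i in range(n):
            for t in range(l):
                residual[i][t] -= col[i] * best_coef[t]
    return support


# Toy example (assumed values): orthogonal columns, rows 1 and 3 of X nonzero.
columns = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
X_rows = {1: [2.0, -1.0], 3: [0.5, 0.5]}
Y = [[sum(columns[j][i] * X_rows[j][t] for j in X_rows) for t in range(2)]
     for i in range(4)]
detected = simultaneous_matching_pursuit(columns, Y, k=2)
```

In this noiseless toy case the row with the larger norm is detected first, mirroring the observation that a sender with larger channel gain is easier to detect.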

Further, according to the interpretations of the main results, we can see that the structure of W plays 
an important role in the performance limits. Roughly speaking, high correlation among the columns of W 
may decrease the performance limit for support recovery, in the sense that, given other parameters fixed, 
the dimension of the signal m should be reduced to guarantee successful support recovery. However, as 
observed in practice, when only a finite number of measurements per measurement vector are available, 
a strong correlation among the columns of $W$ actually facilitates the estimation of the nonzero signal values,
and hence can be beneficial to the performance. Hence, there is an interplay that is not revealed by
the asymptotic analysis. It will be interesting to study an analytical approach which links the estimation
quality of the nonzero values in the finite case and the performance limits of support recovery in the asymptotic
case.
Appendix A
Proof of Theorem 1 

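For ease of exposition, we consider two distinct cases on the number of nonzero rows of $X$.

Case 1: $k = 1$. In this case, the signal of interest is $X = X(W, S_1)$, where $W = [w_{1,1}, \ldots, w_{1,l}]$. Fix
$\epsilon > 0$. We first form an estimate $\rho_i$ of $|w_{1,i}|$ for $i \in [l]$ as

$$\rho_i \triangleq \sqrt{\frac{\left| \frac{1}{n}\|Y_i\|_2^2 - \sigma_z^2 \right|}{\sigma_a^2}}. \tag{18}$$

Declare that $\hat{s}_1 \in [m]$ is the estimated index of the nonzero row, i.e., $d^{(m)}(Y) = \{\hat{s}_1\}$, if it is the
unique index such that

$$\frac{1}{nl} \left\| Y - A_{\hat{s}_1} \left[ (-1)^{q_1}\rho_1, \ldots, (-1)^{q_l}\rho_l \right] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \tag{19}$$

for $q_i = 1$ or $q_i = 2$, $i \in [l]$. If there is none or more than one such index, pick an arbitrary index.
We analyze the average probability of error

$$P(\mathcal{E}) = P\{ d^{(m)}(Y) \neq \mathrm{supp}(X(W, S_1)) \}. \tag{20}$$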
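The magnitude estimate used in the $k = 1$ case can be illustrated with a short simulation. This is a minimal sketch assuming the estimator has the energy-based form $\rho = \sqrt{|\frac{1}{n}\|Y_i\|_2^2 - \sigma_z^2| / \sigma_a^2}$ with i.i.d. Gaussian measurement entries and noise; the function name, the seed, and the parameter values are hypothetical choices for the example.

```python
import math
import random


def estimate_magnitude(y_col, sigma_a2, sigma_z2):
    """Energy-based estimate of |w| from one measurement vector y = a * w + z."""
    n = len(y_col)
    energy = sum(v * v for v in y_col) / n
    return math.sqrt(abs(energy - sigma_z2) / sigma_a2)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n, w, sigma_a2, sigma_z2 = 20000, -1.5, 1.0, 1.0
# One column of Y: entries of the active column of A scaled by w, plus noise.
y_col = [rng.gauss(0.0, math.sqrt(sigma_a2)) * w
         + rng.gauss(0.0, math.sqrt(sigma_z2)) for _ in range(n)]
rho = estimate_magnitude(y_col, sigma_a2, sigma_z2)
```

The estimate recovers only the magnitude $|w|$, which is why the decoding rule searches over both signs through the exponents $q_i$.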

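Due to the symmetry in the problem and the measurement matrix generation, we assume without loss of
generality $S_1 = 1$, that is,

$$Y = A_1 W + Z \tag{21}$$

for some $W = [w_{1,1}, \ldots, w_{1,l}] \in \mathbb{R}^{1 \times l}$. In the following analysis, we drop superscripts and subscripts on
$m$ for notational simplicity when no ambiguity arises. Define the events

$$\mathcal{E}_s = \left\{ \forall i \in [l],\ \exists q_i \in \{1, 2\}, \text{ such that } \frac{1}{nl} \left\| Y - A_s \left[ (-1)^{q_1}\rho_1, \ldots, (-1)^{q_l}\rho_l \right] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \right\}, \quad s \in [m].$$

Then,

$$P(\mathcal{E}) \le P\left( \mathcal{E}_1^c \cup \left( \cup_{s=2}^{m} \mathcal{E}_s \right) \right) \tag{22}$$

where $\mathcal{E}^c$ denotes the complement event of $\mathcal{E}$. Let

$$\mathcal{E}_{\mathrm{aux}} = \left\{ \det\left( \tfrac{1}{n} (A_1 W + Z)^T (A_1 W + Z) \right) - \det\left( \sigma_a^2 W^T W + \sigma_z^2 I \right) \in (-\epsilon, \epsilon) \right\} \cap \left( \bigcap_{i=1}^{l} \left\{ \rho_i - |w_{1,i}| \in (-\epsilon, \epsilon) \right\} \right).$$

Then, by the union of events bound and the fact that $A^c \cup B = A^c \cup (B \cap A)$,

$$P(\mathcal{E}) \le P(\mathcal{E}_{\mathrm{aux}}^c) + P(\mathcal{E}_1^c) + \sum_{s=2}^{m} P(\mathcal{E}_s \cap \mathcal{E}_{\mathrm{aux}}). \tag{23}$$

We bound each term in (23). First, by the weak law of large numbers (LLN), $\lim_{m\to\infty} P(\mathcal{E}_{\mathrm{aux}}^c) = 0$.
Next, we consider $P(\mathcal{E}_1^c)$. It can be readily seen that, with $q_i = (3 + \mathrm{sign}(w_{1,i}))/2$,

$$\lim_{m\to\infty} P\left( \frac{1}{nl} \left\| Y - A_1 \left[ (-1)^{q_1}\rho_1, \ldots, (-1)^{q_l}\rho_l \right] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \right) = 1. \tag{24}$$

Hence, $\lim_{m\to\infty} P(\mathcal{E}_1^c) = 0$.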

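Next, we consider the third term in (23). We need the following lemma, whose proof is presented at
the end of this appendix.

Lemma 1: Let $B \in \mathbb{R}^{n \times l}$ be a fixed matrix satisfying $\left( \prod_{i=1}^{l} \left[ \frac{1}{n} B^T B \right]_{i,i} \right)^{1/l} = \alpha > 0$. Let $S \subseteq [l]$ be
a fixed set. Let $D \in \mathbb{R}^{n \times l}$ be a matrix such that, for $j \in S$, $D_j \sim \mathcal{N}(0, \theta_j I)$ with some $\theta_j > 0$; for
$j \in S^c$, $D_j = 0$. All columns of $D$ are independent. Then, for any $\gamma \in (0, \alpha)$,

$$P\left( \frac{1}{nl} \|B - D\|_F^2 < \gamma \right) \le 2^{-\frac{nl}{2} \log\frac{\alpha}{\gamma}}. \tag{25}$$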

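We continue the proof of Theorem 1. Consider $P(\mathcal{E}_s \cap \mathcal{E}_{\mathrm{aux}})$ for $s \neq 1$. Note that

$$P(\mathcal{E}_s \cap \mathcal{E}_{\mathrm{aux}}) \le P(\mathcal{E}_s | \mathcal{E}_{\mathrm{aux}}) = \int P(\mathcal{E}_s | \{Y = Y_1\} \cap \mathcal{E}_{\mathrm{aux}}) \, f(Y_1 | \mathcal{E}_{\mathrm{aux}}) \, dY_1.$$

Let $[(-1)^{q_1}\rho_1, \ldots, (-1)^{q_l}\rho_l] = UQV^T$ denote the singular value decomposition. Since $A_s$ is independent
of $Y$ and $\rho_i$ for $s \neq 1$, it follows from Lemma 1 that (by treating $B = YV$ and $D = A_s U Q$), for
$q_i \in \{1, 2\}$, $i \in [l]$, and sufficiently small $\epsilon$,

$$\begin{aligned}
& P\left( \frac{1}{nl} \left\| Y - A_s \left[ (-1)^{q_1}\rho_1, \ldots, (-1)^{q_l}\rho_l \right] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \,\Big|\, \{Y = Y_1\} \cap \mathcal{E}_{\mathrm{aux}} \right) \\
&\quad = P\left( \frac{1}{nl} \left\| YV - A_s U Q \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \,\Big|\, \{Y = Y_1\} \cap \mathcal{E}_{\mathrm{aux}} \right) \\
&\quad \le 2^{-\frac{nl}{2} \log \frac{\left( \prod_{i=1}^{l} \left[ \frac{1}{n} V^T Y_1^T Y_1 V \right]_{i,i} \right)^{1/l}}{\sigma_z^2 + \epsilon^2\sigma_a^2}} \\
&\quad \le 2^{-\frac{n}{2} \log \frac{\det\left( \frac{1}{n} Y_1^T Y_1 \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}} \qquad (26) \\
&\quad \le 2^{-\frac{n}{2} \log \frac{\det\left( \sigma_a^2 W^T W + \sigma_z^2 I \right) - \epsilon}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}}
\end{aligned}$$

where (26) follows from Hadamard's inequality [63]. Thus,

$$P(\mathcal{E}_s | \{Y = Y_1\} \cap \mathcal{E}_{\mathrm{aux}}) \le 2^l \cdot 2^{-\frac{n}{2} \log \frac{\det\left( \sigma_a^2 W^T W + \sigma_z^2 I \right) - \epsilon}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}}$$

and hence

$$\sum_{s=2}^{m} P(\mathcal{E}_s \cap \mathcal{E}_{\mathrm{aux}}) \le 2^l \cdot m \cdot 2^{-\frac{n}{2} \log \frac{\det\left( \sigma_a^2 W^T W + \sigma_z^2 I \right) - \epsilon}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}}$$

which tends to zero as $m \to \infty$, if

$$\limsup_{m\to\infty} \frac{\log m}{n_m} < \frac{1}{2} \log \frac{\det\left( \sigma_a^2 W^T W + \sigma_z^2 I \right) - \epsilon}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}. \tag{27}$$

Since $\epsilon > 0$ is chosen arbitrarily, we have the desired proof of Theorem 1.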



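Case 2: $k \ge 2$. In this case, the signal of interest is $X = X(W, S)$. Fix $\epsilon > 0$. First, for $i \in [l]$, we
form an estimate of $\|w_i\|_2$ as

$$\rho_i \triangleq \sqrt{\frac{\left| \frac{1}{n}\|Y_i\|_2^2 - \sigma_z^2 \right|}{\sigma_a^2}}. \tag{28}$$

For $r, \zeta > 0$, let $Q = Q(r, \zeta)$ be a minimal set of points in $\mathbb{R}^k$ satisfying the following properties:

i) $Q \subseteq B_k(r)$, where $B_k(r)$ is the $k$-dimensional hypersphere of radius $r$.

ii) For any $b \in B_k(r)$, there exists $\hat{w} \in Q$ such that $\|\hat{w} - b\|_2 \le \frac{\zeta}{2}$.

The following properties can be easily proved:

Lemma 2: 1) For $i \in [l]$, $\lim_{m\to\infty} P\left( \exists \hat{w}_i \in Q(\rho_i, \zeta) \text{ such that } \|\hat{w}_i - w_i\|_2 < \zeta \right) = 1$.
2) $q(r, \zeta) \triangleq |Q(r, \zeta)|$ is monotonically non-decreasing in $r$ for fixed $\zeta$.

For $i \in [l]$, given $\rho_i$ and $\epsilon$, fix $Q_i = Q(\rho_i, \epsilon)$. Declare that $d(Y) = \{\hat{s}_1, \ldots, \hat{s}_k\} \subseteq [m]$ is the recovered set
of indices of nonzero rows of $X$, if it is the unique set of indices such that

$$\frac{1}{nl} \left\| Y - [A_{\hat{s}_1}, \ldots, A_{\hat{s}_k}] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \tag{29}$$

for some $\hat{w}_i \in Q_i$, $i \in [l]$. If there is none or more than one such set, pick an arbitrary set of $k$ indices.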
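Next, we analyze the average probability of error

$$P(\mathcal{E}) = P\{ d(Y) \neq \mathrm{supp}(X(W, S)) \}. \tag{30}$$

Without loss of generality, we assume that $S_j = j$ for $j = 1, 2, \ldots, k$, which gives

$$Y = [A_1, \ldots, A_k] W + Z \tag{31}$$

for some $W$. Define the event

$$\mathcal{E}_{s_1, s_2, \ldots, s_k} = \left\{ \exists \hat{w}_i \in Q_i \text{ and } \{s'_1, \ldots, s'_k\} = \{s_1, \ldots, s_k\} \text{ s.t. } \frac{1}{nl} \left\| Y - [A_{s'_1}, \ldots, A_{s'_k}] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \right\}.$$

Define $\sigma_{\max}^2$ and $\sigma_{\min}^2$ to be the largest and smallest eigenvalues of the matrix

$$\frac{1}{n\sigma_a^2} \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right]^T \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right],$$

respectively. Then

$$\begin{aligned}
P(\mathcal{E}) &= P\left( \mathcal{E}_{1,2,\ldots,k}^c \cup \bigcup_{s_1 < \cdots < s_k : \{s_1, \ldots, s_k\} \neq [k]} \mathcal{E}_{s_1, s_2, \ldots, s_k} \right) \\
&\le P(\mathcal{E}_{\mathrm{aux}}^c) + P(\mathcal{E}_{1,2,\ldots,k}^c) + \sum_{s_1 < \cdots < s_k : \{s_1, \ldots, s_k\} \neq [k]} P(\mathcal{E}_{s_1, s_2, \ldots, s_k} \cap \mathcal{E}_{\mathrm{aux}})
\end{aligned} \tag{32}$$

where

$$\mathcal{E}_{\mathrm{aux}} = \left\{ \sigma_{\max}^2 \in (1 - \epsilon, 1 + \epsilon) \right\} \cap \left\{ \sigma_{\min}^2 \in (1 - \epsilon, 1 + \epsilon) \right\} \cap \left( \bigcap_{i=1}^{l} \left\{ \rho_i - \|w_i\|_2 \in (-\epsilon, \epsilon) \right\} \right).$$

First, note that $\lim_{m\to\infty} P(\mathcal{E}_{\mathrm{aux}}) = 1$ due to the LLN and the properties of the extreme eigenvalues of
random matrices [64]. Next, consider

$$\begin{aligned}
\frac{1}{nl} \left\| Y - [A_1, \ldots, A_k] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2
&= \frac{1}{nl} \left\| [A_1, \ldots, A_k] W + Z - [A_1, \ldots, A_k] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 \\
&= \frac{1}{nl} \left\| \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right] \begin{bmatrix} W - [\hat{w}_1, \ldots, \hat{w}_l] \\ \tfrac{\sigma_z}{\sigma_a} I \end{bmatrix} \right\|_F^2 \\
&\le \frac{1}{l} \sigma_{\max}^2 \sigma_a^2 \left\| \begin{bmatrix} W - [\hat{w}_1, \ldots, \hat{w}_l] \\ \tfrac{\sigma_z}{\sigma_a} I \end{bmatrix} \right\|_F^2.
\end{aligned}$$

By using the fact that $\sigma_{\max}^2 \to 1$ almost surely as $n \to \infty$ [64] and Lemma 2-1), we have $\lim_{m\to\infty} P(\mathcal{E}_{1,2,\ldots,k}^c) = 0$.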

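Next, we consider $P(\mathcal{E}_{s_1, s_2, \ldots, s_k} \cap \mathcal{E}_{\mathrm{aux}})$ for $\{s_1, s_2, \ldots, s_k\} \neq [k]$. Note that

$$P(\mathcal{E}_{s_1, s_2, \ldots, s_k} \cap \mathcal{E}_{\mathrm{aux}}) \le P(\mathcal{E}_{s_1, s_2, \ldots, s_k} | \mathcal{E}_{\mathrm{aux}}) \tag{33}$$

$$= \int_{\{a_1, \ldots, a_k, z_0\} \in \mathcal{E}_{\mathrm{aux}}} P\left( \mathcal{E}_{s_1, s_2, \ldots, s_k} \,|\, \{A_1 = a_1\} \cap \cdots \cap \{A_k = a_k\} \cap \{Z = z_0\} \cap \mathcal{E}_{\mathrm{aux}} \right) f(a_1, \ldots, a_k, z_0 | \mathcal{E}_{\mathrm{aux}}) \, da_1 \cdots da_k \, dz_0. \tag{34}$$

For notational simplicity, define $\gamma = \sigma_z^2 + \epsilon^2\sigma_a^2$, $T = \{s_1, s_2, \ldots, s_k\} \cap [k]$, $T^c = \{s_1, s_2, \ldots, s_k\} \setminus T$,
and $\mathcal{E}_{\mathrm{cond}} = \{A_1 = a_1\} \cap \cdots \cap \{A_k = a_k\} \cap \{Z = z_0\} \cap \mathcal{E}_{\mathrm{aux}}$. For any permutation $(s'_1, s'_2, \ldots, s'_k)$ of
$\{s_1, s_2, \ldots, s_k\}$ and any $\hat{w}_i \in Q_i$, $i \in [l]$,

$$P\left( \frac{1}{nl} \left\| Y - [A_{s'_1}, \ldots, A_{s'_k}] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 < \gamma \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) = P\left( \frac{1}{nl} \left\| [A_1, \ldots, A_k] W + Z - [A_{s'_1}, \ldots, A_{s'_k}] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 < \gamma \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right). \tag{35}$$

Define the matrix $W' \in \mathbb{R}^{k \times l}$ as

$$W'_j = \begin{cases} W_j & \text{if } j \in [k] \setminus T \\ W_j - \hat{W}_i & \text{if } j = s'_i \in T \end{cases}$$

where $\hat{W}_i$ denotes the $i$th row of the matrix $[\hat{w}_1, \ldots, \hat{w}_l]$. Define $\tilde{W} \in \mathbb{R}^{k \times l}$ as

$$\tilde{W}_i = \begin{cases} \hat{W}_i & \text{if } s'_i \in T^c \\ 0 & \text{if } s'_i \in T \end{cases}$$

where $0$ is a zero row vector of a proper size. Then, continuing from (35), we have

$$\begin{aligned}
& P\left( \frac{1}{nl} \left\| [A_1, \ldots, A_k] W + Z - [A_{s'_1}, \ldots, A_{s'_k}] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 < \gamma \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \\
&\quad = P\left( \frac{1}{nl} \left\| [A_1, \ldots, A_k] W' + Z - [A_{s'_1}, \ldots, A_{s'_k}] \tilde{W} \right\|_F^2 < \gamma \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \qquad (36) \\
&\quad = P\left( \frac{1}{nl} \left\| \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right] \begin{bmatrix} W' \\ \tfrac{\sigma_z}{\sigma_a} I \end{bmatrix} - [A_{s'_1}, \ldots, A_{s'_k}] \tilde{W} \right\|_F^2 < \gamma \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \qquad (37) \\
&\quad = P\left( \frac{1}{nl} \left\| \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right] \begin{bmatrix} W' \\ \tfrac{\sigma_z}{\sigma_a} I \end{bmatrix} - \tilde{A} \tilde{W}_1 \right\|_F^2 < \gamma \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \qquad (38) \\
&\quad = P\left( \frac{1}{nl} \left\| \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right] \begin{bmatrix} W' \\ \tfrac{\sigma_z}{\sigma_a} I \end{bmatrix} V - \tilde{A} U Q \right\|_F^2 < \gamma \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \qquad (39)
\end{aligned}$$

where in (38) $\tilde{W}_1$ denotes the matrix formed by removing the zero rows in $\tilde{W}$, and $\tilde{A}$ denotes the matrix
formed by removing the columns of $[A_{s'_1}, \ldots, A_{s'_k}]$ indexed by the indices of the zero rows of $\tilde{W}$. To reach (39),
let $\tilde{W}_1 = UQV^T$ denote the singular value decomposition. The following lemma, the proof of which is
presented at the end of this appendix, is useful.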

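Lemma 3: Let $B \in \mathbb{R}^{p \times q}$, $D \in \mathbb{R}^{q \times r}$. Let $\sigma_b^2$ denote the smallest eigenvalue of $B^T B$. Then

$$\det\left( (BD)^T BD \right) \ge (\sigma_b^2)^r \det(D^T D).$$

Let $M \triangleq \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right] \begin{bmatrix} W' \\ \tfrac{\sigma_z}{\sigma_a} I \end{bmatrix} V$. Conditioned on $\mathcal{E}_{\mathrm{cond}}$ and the chosen $\hat{w}_i$ for $i \in [l]$, $M$ is
fixed. According to Lemma 3 (treating $B = \frac{1}{\sqrt{n}} \left[ A_1, \ldots, A_k, \tfrac{\sigma_a}{\sigma_z} Z \right]$ and $D = \begin{bmatrix} W' \\ \tfrac{\sigma_z}{\sigma_a} I \end{bmatrix} V$),

$$\det\left( \frac{1}{n} M^T M \right) \ge \left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W'^T W' + \frac{\sigma_z^2}{\sigma_a^2} I \right).$$

Continue with (39). Using Lemma 1 (treating $B = M$ and $D = \tilde{A} U Q$), we have

$$\begin{aligned}
& P\left( \frac{1}{nl} \left\| M - \tilde{A} U Q \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \\
&\quad \le 2^{-\frac{n}{2} \log \frac{\det\left( \frac{1}{n} M^T M \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}} \qquad (40) \\
&\quad \le 2^{-\frac{n}{2} \log \frac{\left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W'^T W' + \frac{\sigma_z^2}{\sigma_a^2} I \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}} \qquad (41) \\
&\quad \le 2^{-\frac{n}{2} \log \frac{\left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W_{[k] \setminus T}^T W_{[k] \setminus T} + \frac{\sigma_z^2}{\sigma_a^2} I \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}} \qquad (42)
\end{aligned}$$

where (42) uses the fact that, up to a reordering of rows,

$$W'^T W' = \begin{bmatrix} W_{[k] \setminus T} \\ O \end{bmatrix}^T \begin{bmatrix} W_{[k] \setminus T} \\ O \end{bmatrix} + \begin{bmatrix} O \\ W'_T \end{bmatrix}^T \begin{bmatrix} O \\ W'_T \end{bmatrix}$$

where $O$ denotes the matrix with elements all being zeros, and the fact that [65, Corollary 8.4.15], for
positive semidefinite $B, D \in \mathbb{R}^{l \times l}$, $\det(B + D) \ge \det(B)$. By the union of events bound,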

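$$\begin{aligned}
P(\mathcal{E}_{s_1, s_2, \ldots, s_k} | \mathcal{E}_{\mathrm{cond}})
&\le \sum_{(s'_1, \ldots, s'_k) : \{s'_1, \ldots, s'_k\} = \{s_1, \ldots, s_k\}} P\left( \forall i, \exists \hat{w}_i \in Q_i \text{ such that } \frac{1}{nl} \left\| Y - [A_{s'_1}, \ldots, A_{s'_k}] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \\
&\le \sum_{(s'_1, \ldots, s'_k) : \{s'_1, \ldots, s'_k\} = \{s_1, \ldots, s_k\}} \ \sum_{\hat{w}_1 \in Q_1} \cdots \sum_{\hat{w}_l \in Q_l} P\left( \frac{1}{nl} \left\| Y - [A_{s'_1}, \ldots, A_{s'_k}] \, [\hat{w}_1, \ldots, \hat{w}_l] \right\|_F^2 < \sigma_z^2 + \epsilon^2\sigma_a^2 \,\Big|\, \mathcal{E}_{\mathrm{cond}} \right) \\
&\le k! \cdot \left( \prod_{i=1}^{l} |Q_i| \right) \cdot 2^{-\frac{n}{2} \log \frac{\left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W_{[k] \setminus T}^T W_{[k] \setminus T} + \frac{\sigma_z^2}{\sigma_a^2} I \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}}.
\end{aligned}$$

Furthermore, conditioned on $\mathcal{E}_{\mathrm{aux}}$, $\rho_i < \|w_i\|_2 + \epsilon$ for $i \in [l]$ and hence $|Q_i| \le q(\|w_i\|_2 + \epsilon, \epsilon)$ by Lemma
2-2). Thus,

$$P(\mathcal{E}_{s_1, s_2, \ldots, s_k} \cap \mathcal{E}_{\mathrm{aux}}) \le k! \cdot \left( \prod_{i=1}^{l} q(\|w_i\|_2 + \epsilon, \epsilon) \right) \cdot 2^{-\frac{n}{2} \log \frac{\left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W_{[k] \setminus T}^T W_{[k] \setminus T} + \frac{\sigma_z^2}{\sigma_a^2} I \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}}. \tag{43}$$

Note that the probability upper bound (43) depends on $s_1, \ldots, s_k$ only through $T$. Grouping the $\binom{m-k}{k-|T|}$
events $\{\mathcal{E}_{s_1, s_2, \ldots, s_k} \cap \mathcal{E}_{\mathrm{aux}}\}$ with the same $T$,

$$\begin{aligned}
P(\mathcal{E}) &\le P(\mathcal{E}_{\mathrm{aux}}^c) + P(\mathcal{E}_{1,2,\ldots,k}^c) + k! \left( \prod_{i=1}^{l} q(\|w_i\|_2 + \epsilon, \epsilon) \right) \sum_{T \subset [k]} \binom{m-k}{k-|T|} \, 2^{-\frac{n}{2} \log \frac{\left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W_{[k] \setminus T}^T W_{[k] \setminus T} + \frac{\sigma_z^2}{\sigma_a^2} I \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}} \\
&\le P(\mathcal{E}_{\mathrm{aux}}^c) + P(\mathcal{E}_{1,2,\ldots,k}^c) + k! \left( \prod_{i=1}^{l} q(\|w_i\|_2 + \epsilon, \epsilon) \right) \sum_{T \subset [k]} 2^{(k - |T|) \log m - \frac{n}{2} \log \frac{\left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W_{[k] \setminus T}^T W_{[k] \setminus T} + \frac{\sigma_z^2}{\sigma_a^2} I \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l}}
\end{aligned}$$

which tends to zero as $m \to \infty$, if (after relabeling $[k] \setminus T$ as $T$)

$$\limsup_{m\to\infty} \frac{\log m}{n_m} < \frac{1}{2|T|} \log \frac{\left( (1 - \epsilon) \sigma_a^2 \right)^l \det\left( W_T^T W_T + \frac{\sigma_z^2}{\sigma_a^2} I \right)}{(\sigma_z^2 + \epsilon^2\sigma_a^2)^l} \tag{44}$$

for all $T \subseteq [k]$. Since $\epsilon > 0$ is arbitrarily chosen, the proof of Theorem 1 is complete.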



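Next, we prove Lemma 1. For $j \in S$, $(b_{i,j} - D_{i,j})^2 / \theta_j$ is a noncentral $\chi^2$ random variable. Its moment
generating function is [66] (for $t < 1/2$)

$$E\left[ e^{t (b_{i,j} - D_{i,j})^2 / \theta_j} \right] = \frac{\exp\left( \frac{b_{i,j}^2}{\theta_j} \cdot \frac{t}{1 - 2t} \right)}{(1 - 2t)^{1/2}}. \tag{45}$$

By changing variable $\frac{t}{\theta_j} \to t$, we have

$$E\left[ e^{t (b_{i,j} - D_{i,j})^2} \right] = \frac{\exp\left( \frac{b_{i,j}^2 \, t}{1 - 2\theta_j t} \right)}{(1 - 2\theta_j t)^{1/2}}. \tag{46}$$

For $j \in S^c$ with $D_j = 0$, we additionally define $\theta_j = 0$. In this case, (46) continues to hold.

Define

$$S_n \triangleq \frac{1}{nl} \|B - D\|_F^2.$$

Then, we have

$$\begin{aligned}
E\left[ e^{t S_n} \right] &= E\left[ e^{\frac{t}{nl} \|B - D\|_F^2} \right] \qquad (49) \\
&= E\left[ e^{\frac{t}{nl} \sum_{j=1}^{l} \|b_j - D_j\|_2^2} \right] \qquad (50) \\
&= \prod_{j=1}^{l} E\left[ e^{\frac{t}{nl} \|b_j - D_j\|_2^2} \right]. \qquad (51)
\end{aligned}$$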

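The Chernoff bound indicates that, with the change of variable $p = -s/l$,

$$\begin{aligned}
P(S_n < \gamma) &\le \min_{s > 0} e^{s\gamma} E\left[ e^{-s S_n} \right] \\
&= \min_{s > 0} e^{s\gamma} \prod_{j=1}^{l} \prod_{i=1}^{n} \frac{\exp\left( -\frac{s}{nl} \cdot \frac{b_{i,j}^2}{1 + 2\theta_j s/(nl)} \right)}{\left( 1 + 2\theta_j s/(nl) \right)^{1/2}} \\
&= \min_{p < 0} \exp\left\{ -lp\gamma + \sum_{j=1}^{l} \frac{p \, \|b_j\|_2^2}{n \left( 1 - \frac{2\theta_j p}{n} \right)} - \frac{n}{2} \sum_{j=1}^{l} \log\left( 1 - \frac{2\theta_j p}{n} \right) \right\} \\
&\le \min_{p < 0} \exp\Bigg\{ \underbrace{-lp\gamma + lp \left( \prod_{j=1}^{l} \frac{\|b_j\|_2^2}{n \left( 1 - \frac{2\theta_j p}{n} \right)} \right)^{1/l} - \frac{n}{2} \sum_{j=1}^{l} \log\left( 1 - \frac{2\theta_j p}{n} \right)}_{= f(p)} \Bigg\} \qquad (59) \\
&= \exp\left\{ \min_{p < 0} f(p) \right\}
\end{aligned}$$

where (59) follows from the fact that the arithmetic mean is no smaller than the geometric mean. On the
other hand, define the function

$$g(p, \theta) = -lp\gamma + lp \, \frac{\left( \prod_{j=1}^{l} \|b_j\|_2^2 \right)^{1/l}}{n \left( 1 - \frac{2\theta p}{n} \right)} - \frac{nl}{2} \log\left( 1 - \frac{2\theta p}{n} \right).$$

Overall,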

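$$\min_{p < 0} \left( \max_{\theta > 0} g(p, \theta) \right) = \min\left\{ -\frac{nl}{2} \left( 1 - \frac{\gamma}{\alpha} \right), \ \frac{nl}{2} \log\frac{\gamma}{\alpha} \right\}. \tag{74}$$

Using the fact that $0 < 1 - \frac{1}{x} < \log x$ for $x > 1$, we finally have

$$\min_{p < 0} \left( \max_{\theta > 0} g(p, \theta) \right) = \frac{nl}{2} \log\frac{\gamma}{\alpha}. \tag{75}$$

Therefore,

$$\begin{aligned}
P(S_n < \gamma) &\le \exp\left\{ \min_{p < 0} f(p) \right\} \qquad (76) \\
&\le \exp\left\{ \min_{p < 0} \left( \max_{\theta > 0} g(p, \theta) \right) \right\} \qquad (77) \\
&= 2^{\frac{nl}{2} \log\frac{\gamma}{\alpha}} \qquad (78) \\
&= 2^{-\frac{nl}{2} \log\frac{\alpha}{\gamma}}. \qquad (79)
\end{aligned}$$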

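The remaining task is to prove Lemma 3. Let $\sigma_{b,1}^2 \ge \cdots \ge \sigma_{b,q}^2$ be the $q$ eigenvalues of $B^T B$,
where $\sigma_{b,q}^2 = \sigma_b^2$. The eigen-decomposition states that there exists a unitary matrix $J \in \mathbb{R}^{q \times q}$ such that
$B^T B = J G G J^T$, where $G \in \mathbb{R}^{q \times q}$ is a diagonal matrix with the $i$th diagonal element being $\sigma_{b,i}$. Thus,
$D^T B^T B D = D^T J G G J^T D = FT$, where $F = D^T J G$ and $T = F^T$. Note that

$$\begin{aligned}
\det\left( (BD)^T BD \right) &= \det(FT) \\
&= \sum_{1 \le j_1 < \cdots < j_r \le q} \det \begin{bmatrix} f_{1,j_1} & \cdots & f_{1,j_r} \\ \vdots & & \vdots \\ f_{r,j_1} & \cdots & f_{r,j_r} \end{bmatrix} \det \begin{bmatrix} f_{1,j_1} & \cdots & f_{1,j_r} \\ \vdots & & \vdots \\ f_{r,j_1} & \cdots & f_{r,j_r} \end{bmatrix}^T \qquad (80) \\
&= \sum_{1 \le j_1 < \cdots < j_r \le q} \left( \det \begin{bmatrix} f_{1,j_1} & \cdots & f_{1,j_r} \\ \vdots & & \vdots \\ f_{r,j_1} & \cdots & f_{r,j_r} \end{bmatrix} \right)^2 \\
&= \sum_{1 \le j_1 < \cdots < j_r \le q} \left( \det\left\{ [D^T J]_{:,(j_1, \ldots, j_r)} \, \mathrm{diag}(\sigma_{b,j_1}, \ldots, \sigma_{b,j_r}) \right\} \right)^2 \\
&\ge (\sigma_b^2)^r \sum_{1 \le j_1 < \cdots < j_r \le q} \left( \det [D^T J]_{:,(j_1, \ldots, j_r)} \right)^2 \\
&= (\sigma_b^2)^r \det(D^T J J^T D) \\
&= (\sigma_b^2)^r \det(D^T D)
\end{aligned}$$

where $[D^T J]_{:,(j_1, \ldots, j_r)}$ denotes the submatrix of $D^T J$ formed by columns $j_1, \ldots, j_r$, and (80) is due to the Binet-Cauchy formula [67].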
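Lemma 3 can be sanity-checked numerically on small matrices. The following is a minimal sketch, assuming $B \in \mathbb{R}^{3 \times 2}$ so that the smallest eigenvalue of $B^T B$ has a closed form; the helper names and the specific matrices are illustrative assumptions, and the check covers both $r = 1$ and $r = 2$.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def det_small(m):
    """Determinant of a 1x1 or 2x2 matrix."""
    if len(m) == 1:
        return m[0][0]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def gram(a):
    """Gram matrix a^T a."""
    return matmul(transpose(a), a)


def smallest_eig_2x2(g):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    tr = g[0][0] + g[1][1]
    disc = math.sqrt(max(tr * tr - 4.0 * det_small(g), 0.0))
    return 0.5 * (tr - disc)


def lemma3_sides(B, D):
    """Return (det((BD)^T BD), sigma_b^(2r) * det(D^T D)) for a p x 2 matrix B."""
    r = len(D[0])
    lhs = det_small(gram(matmul(B, D)))
    rhs = smallest_eig_2x2(gram(B)) ** r * det_small(gram(D))
    return lhs, rhs


B = [[1.0, 0.5], [0.0, 2.0], [-1.0, 1.0]]
checks = [lemma3_sides(B, D)
          for D in ([[1.0], [-2.0]], [[1.0, 3.0], [-2.0, 0.5]])]
```

For $r = 1$ the inequality reduces to the Rayleigh quotient bound, and for square $D$ it reduces to $\det(B^T B) \ge (\sigma_b^2)^q$.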



Appendix B 
Proof of Theorem 2 

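To establish this theorem, we prove the following equivalent statement:

If there exist a sequence of matrices $\{A^{(m)}\}_{m=k}^{\infty}$, $A^{(m)} \in \mathbb{R}^{n_m \times m}$, and a sequence of support recovery
maps $\{d^{(m)}\}_{m=k}^{\infty}$, $d^{(m)} : \mathbb{R}^{n_m \times l} \to 2^{\{1, 2, \ldots, m\}}$, such that

$$\frac{1}{n_m m} \|A^{(m)}\|_F^2 \le \sigma_a^2$$

and

$$\lim_{m\to\infty} P\{ d^{(m)}(A^{(m)} X + Z) \neq \mathrm{supp}(X(W, S)) \} = 0$$

then

$$\limsup_{m\to\infty} \frac{\log m}{n_m} \le c(W).$$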

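For any $T \subseteq [k]$, denote the tuple of random variables $(S_l : l \in T)$ by $S(T)$. For notational simplicity,
let $P_e^{(m)} = P\{ d^{(m)}(A^{(m)} X + Z) \neq \mathrm{supp}(X(W, S)) \}$. From Fano's inequality [63], we have

$$\begin{aligned}
H(S(T) | Y) &\le H(S_1, \ldots, S_k | Y) \\
&\le \log k! + H(\{S_1, \ldots, S_k\} | Y) \\
&\le \log k! + P_e^{(m)} \log\binom{m}{k} + 1.
\end{aligned} \tag{81}$$

On the other hand,

$$H(S(T) | S(T^c)) = \log \prod_{q=0}^{|T|-1} \left( m - (k - |T|) - q \right) = |T| \log m - n \epsilon_{1,n} \tag{82}$$

where $T^c = [k] \setminus T$ and

$$\epsilon_{1,n} \triangleq \frac{1}{n} \log\left( m^{|T|} \Big/ \prod_{q=0}^{|T|-1} \left( m - (k - |T|) - q \right) \right)$$

which tends to zero as $n \to \infty$. Hence, combining (81) and (82), we have

$$\begin{aligned}
|T| \log m &= H(S(T) | S(T^c)) + n \epsilon_{1,n} \\
&= I(S(T); Y | S(T^c)) + H(S(T) | Y, S(T^c)) + n \epsilon_{1,n} \\
&\le I(S(T); Y | S(T^c)) + H(S(T) | Y) + n \epsilon_{1,n} \qquad (83) \\
&\le I(S(T); Y | S(T^c)) + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \\
&= \sum_{i=1}^{n} I(Y_i; S(T) | Y_{[i-1]}, S(T^c)) + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \\
&= \sum_{i=1}^{n} \left( h(Y_i | Y_{[i-1]}, S(T^c)) - h(Y_i | Y_{[i-1]}, S([k])) \right) + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \\
&\le \sum_{i=1}^{n} \left( h(Y_i | S(T^c)) - h(Y_i | S_1, \ldots, S_k) \right) + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \qquad (84) \\
&= \sum_{i=1}^{n} \left( h(Y_i | S(T^c)) - h(Z_i) \right) + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \qquad (85)
\end{aligned}$$

where $Y_{[i-1]}$ denotes the set $\{Y_1, \ldots, Y_{i-1}\}$. To explain some intermediate steps, (83) follows from the
fact that conditioning reduces entropy, (84) holds because $Y_i$ is independent of $Y_{[i-1]}$ when conditioned
on $S([k])$, and (85) follows since the measurement matrix is fixed and $Z_i$ is independent of $(S_1, \ldots, S_k)$.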
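Consider

$$\begin{aligned}
h(Y_i | S(T^c)) &= h\left( A_{i,S(T)} W_T + Z_i \,|\, S(T^c) \right) \\
&\le \frac{1}{2} \log\left( (2\pi e)^l \cdot \det\left( E\left[ (A_{i,S(T)} W_T + Z_i)^T (A_{i,S(T)} W_T + Z_i) \right] - E\left[ A_{i,S(T)} W_T + Z_i \right]^T E\left[ A_{i,S(T)} W_T + Z_i \right] \right) \right) \\
&= \frac{1}{2} \log\left( (2\pi e)^l \cdot \det\left( W_T^T \left( E\left[ A_{i,S(T)}^T A_{i,S(T)} \right] - E\left[ A_{i,S(T)} \right]^T E\left[ A_{i,S(T)} \right] \right) W_T + \sigma_z^2 I \right) \right)
\end{aligned} \tag{86}$$

where (86) follows from the fact that with the same covariance the Gaussian random vector maximizes
the entropy [63], and the randomness in $A_{i,S(T)}$ is due to the randomness of the index set $S(T)$. Note
that

$$E\left[ A_{i,S(T)}^T A_{i,S(T)} \right] = \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p}^2 \right) I + \left( \frac{1}{m(m-1)} \sum_{p=1}^{m} \sum_{q \neq p} a_{i,p} a_{i,q} \right) (1 \cdot 1^T - I). \tag{87}$$

Meanwhile

$$E\left[ A_{i,S(T)} \right] = \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p} \right) 1^T, \tag{88}$$

$$E\left[ A_{i,S(T)} \right]^T E\left[ A_{i,S(T)} \right] = \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p} \right)^2 1 \cdot 1^T. \tag{89}$$

Thus

$$\begin{aligned}
& E\left[ A_{i,S(T)}^T A_{i,S(T)} \right] - E\left[ A_{i,S(T)} \right]^T E\left[ A_{i,S(T)} \right] \\
&\quad = \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p}^2 \right) I + \frac{1}{m(m-1)} \sum_{p=1}^{m} \sum_{q \neq p} a_{i,p} a_{i,q} (1 \cdot 1^T - I) - \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p} \right)^2 1 \cdot 1^T \\
&\quad = \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p}^2 \right) \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) + \left( \sum_{p=1}^{m} a_{i,p} \right)^2 \left( \frac{1}{m(m-1)} (1 \cdot 1^T - I) - \frac{1}{m^2} 1 \cdot 1^T \right).
\end{aligned} \tag{90}$$

Note that $\frac{1}{m(m-1)} (1 \cdot 1^T - I) - \frac{1}{m^2} 1 \cdot 1^T = \frac{1}{m^2(m-1)} 1 \cdot 1^T - \frac{1}{m(m-1)} I$ is negative semidefinite for
sufficiently large $m$, and so is $W_T^T \left( \frac{1}{m^2(m-1)} 1 \cdot 1^T - \frac{1}{m(m-1)} I \right) W_T$. Hence

$$\det\left( W_T^T \left( E\left[ A_{i,S(T)}^T A_{i,S(T)} \right] - E\left[ A_{i,S(T)} \right]^T E\left[ A_{i,S(T)} \right] \right) W_T + \sigma_z^2 I \right) \le \det\left( \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p}^2 \right) W_T^T \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) W_T + \sigma_z^2 I \right)$$

as a result of [65, Corollary 8.4.15]. Therefore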

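$$\begin{aligned}
|T| \log m &\le \sum_{i=1}^{n} \left[ \frac{1}{2} \log\left( (2\pi e)^l \det\left( \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p}^2 \right) W_T^T \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) W_T + \sigma_z^2 I \right) \right) - \frac{1}{2} \log\left( (2\pi e \sigma_z^2)^l \right) \right] \\
&\qquad + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \\
&= \sum_{i=1}^{n} \frac{1}{2} \log\det\left( \frac{1}{\sigma_z^2} \left( \frac{1}{m} \sum_{p=1}^{m} a_{i,p}^2 \right) W_T^T \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) W_T + I \right) \\
&\qquad + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \\
&\le \frac{n}{2} \log\det\left( \frac{1}{\sigma_z^2} \left( \frac{1}{nm} \sum_{i=1}^{n} \sum_{p=1}^{m} a_{i,p}^2 \right) W_T^T \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) W_T + I \right) \\
&\qquad + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \\
&\le \frac{n}{2} \log\det\left( \frac{\sigma_a^2}{\sigma_z^2} W_T^T \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) W_T + I \right) + \log k! + P_e^{(m)} \log\binom{m}{k} + 1 + n \epsilon_{1,n} \\
&\le \frac{n}{2} \log\det\left( \frac{\sigma_a^2}{\sigma_z^2} W_T^T \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) W_T + I \right) + \log k! + P_e^{(m)} k \log m + 1 + n \epsilon_{1,n}.
\end{aligned} \tag{91}$$

Then, we have

$$\begin{aligned}
& \limsup_{m\to\infty} \left[ \frac{\left( 1 - k P_e^{(m)} / |T| \right) \log m}{n_m} - \frac{\log k! + n_m \epsilon_{1,n} + 1}{n_m |T|} \right] \\
&\quad \le \limsup_{m\to\infty} \frac{1}{2|T|} \log\det\left( \frac{\sigma_a^2}{\sigma_z^2} W_T^T \left( I - \frac{1}{m-1} (1 \cdot 1^T - I) \right) W_T + I \right) \\
&\quad = \frac{1}{2|T|} \log\det\left( \frac{\sigma_a^2}{\sigma_z^2} W_T^T W_T + I \right)
\end{aligned} \tag{92}$$

for all $T \subseteq [k]$. Since $\lim_{m\to\infty} P_e^{(m)} = 0$, we reach the conclusion

$$\limsup_{m\to\infty} \frac{\log m}{n_m} \le \frac{1}{2|T|} \log\det\left( \frac{\sigma_a^2}{\sigma_z^2} W_T^T W_T + I \right) \tag{93}$$

for all $T \subseteq [k]$. This completes the proof of Theorem 2.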
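The last step of the converse relies on the finite-$m$ correction $\frac{1}{m-1}(1 \cdot 1^T - I)$ vanishing in the limit. The sketch below illustrates this numerically for an assumed $2 \times 2$ matrix $W_T$ and an assumed value of $\sigma_a^2/\sigma_z^2$; the function and variable names are hypothetical, and the rate has the form $\frac{1}{2|T|}\log\det(\cdot)$ appearing in the final bounds.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def bound_rate(W, snr, m=None):
    """(1 / (2|T|)) log det(snr * W^T C W + I) for a 2 x 2 matrix W.

    C = I - (11^T - I) / (m - 1) for finite m, and C = I when m is None
    (the limiting expression).
    """
    off = 0.0 if m is None else -1.0 / (m - 1)
    C = [[1.0, off], [off, 1.0]]
    G = matmul(matmul(transpose(W), C), W)
    M = [[snr * G[i][j] + (1.0 if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    return math.log(det2(M)) / (2 * len(W))


W_T = [[1.0, 2.0], [0.5, -1.0]]   # assumed example, |T| = 2 and l = 2
snr = 0.5                          # assumed value of sigma_a^2 / sigma_z^2
limit_rate = bound_rate(W_T, snr)
gaps = [abs(bound_rate(W_T, snr, m) - limit_rate) for m in (10, 1000, 10**6)]
```

For this example the determinant equals $5.125 + 1.5\delta - \delta^2$ with $\delta = 1/(m-1)$, so the gap to the limiting rate shrinks as $m$ grows.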
Appendix C
Proof of Corollary 3

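To justify this corollary, we need to show

$$\min_{T \subseteq [k]} \frac{1}{2|T|} \log\det\left( I + \frac{\sigma_a^2}{\sigma_z^2} W_T^T W_T \right) = 2 \cdot \frac{1}{2k} \log\left( 1 + k \cdot \frac{\sigma_a^2}{\sigma_z^2} \right) \tag{94}$$

with $W = W_2$. To begin with, recall that $k$ is even, and $W_2$ is defined in (16). For a given $T \subseteq [k]$, let $T_1 = T \cap [\frac{k}{2}]$,
$T_2 = T \setminus T_1$, $t = |T|$, $t_1 = |T_1|$, and $t_2 = |T_2|$. One can obtain

$$W_T^T W_T = \begin{bmatrix} t & t_1 - t_2 \\ t_1 - t_2 & t \end{bmatrix}. \tag{95}$$

Let $a = \frac{\sigma_a^2}{\sigma_z^2}$ for notational simplicity. Thus

$$\frac{1}{2|T|} \log\det\left( I + a W_T^T W_T \right) = \frac{1}{2t} \log\det \begin{bmatrix} 1 + at & a(t_1 - t_2) \\ a(t_1 - t_2) & 1 + at \end{bmatrix} = \frac{1}{2t} \log\left( 1 + 2at + 4a^2 t_1 t_2 \right) \tag{96}$$

where we use the fact that $t = t_1 + t_2$. Note that, for a given $t \in [k]$,

$$\min_{T : T \subseteq [k], |T| = t \le \frac{k}{2}} \frac{1}{2t} \log\left( 1 + 2at + 4a^2 t_1 t_2 \right) = \frac{1}{2t} \log(1 + 2at) \tag{97}$$

and

$$\min_{T : T \subseteq [k], |T| = t > \frac{k}{2}} \frac{1}{2t} \log\left( 1 + 2at + 4a^2 t_1 t_2 \right) = \frac{1}{2t} \log\left( 1 + 2at + 4a^2 \frac{k}{2} \left( t - \frac{k}{2} \right) \right) \tag{98}$$

where we use the implicit constraints that $t_1, t_2 \le \frac{k}{2}$. Then, the problem becomes evaluating
$\min_{t : t \in [k]} f(t)$, where

$$f(t) = \begin{cases} \frac{1}{2t} \log(1 + 2at) & \text{if } 0 < t \le \frac{k}{2}, \\ \frac{1}{2t} \log\left( 1 + 2at + 4a^2 \frac{k}{2} \left( t - \frac{k}{2} \right) \right) & \text{if } \frac{k}{2} + 1 \le t \le k. \end{cases}$$

First, it can be readily seen that $\min_{t : t \in [\frac{k}{2}]} f(t) = \frac{1}{k} \log(1 + ak)$. Next, we consider the function

$$g(t) \triangleq \frac{1}{2t} \log(\beta_1 + \beta_2 t)$$

where $\beta_1 = 1 - a^2 k^2$ and $\beta_2 = 2a(1 + ak)$ for $t \in [\frac{k}{2}, k]$. Note that

$$\frac{dg(t)}{dt} = \frac{1}{2t^2} \left( \frac{\beta_2 t}{\beta_1 + \beta_2 t} - \log(\beta_1 + \beta_2 t) \right).$$

*For the purpose of analysis, the base of logarithm is not important, as long as all of them are consistent. Here, we choose
natural logarithm to simplify the calculation.

To obtain stationary points, we solve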



$$1 - \frac{\beta_1}{\beta_1 + \beta_2 t} + \log\frac{1}{\beta_1 + \beta_2 t} = 0, \tag{99}$$

which is equivalent to

$$1 + v(t) = \beta_1 e^{v(t)} \tag{100}$$

where $v(t) = \log\frac{1}{\beta_1 + \beta_2 t}$. Note that $\beta_1 < 1$. We will consider three different cases. The first case is
$0 < \beta_1 < 1$. By comparing the curves of $1 + v$ and $\beta_1 e^v$ as functions of $v$, we see that there are two
solutions with opposite signs, namely $v_1 < 0$ and $v_2 > 0$, to (100). Note that

$$g\left( \frac{k}{2} \right) = g(k) = \frac{1}{k} \log(1 + ak).$$

Meanwhile, $v(t)$ is monotonically decreasing on $[\frac{k}{2}, k]$, and

$$v(k) = \log\frac{1}{(1 + ak)^2} < v\left( \frac{k}{2} \right) = \log\frac{1}{1 + ak} < 0.$$

Therefore, it is evident that $v(k) < v_1 < v(\frac{k}{2}) < v_2$. Further, it can be readily seen that

$$\frac{dg(t)}{dt} \bigg|_{t = \frac{k}{2}} = \frac{1 + v(t) - \beta_1 e^{v(t)}}{2t^2} \bigg|_{t = \frac{k}{2}} > 0, \qquad \frac{dg(t)}{dt} \bigg|_{t = k} = \frac{1 + v(t) - \beta_1 e^{v(t)}}{2t^2} \bigg|_{t = k} < 0.$$

In summary, $g(t)$ is increasing at $t = \frac{k}{2}$ and decreasing at $t = k$, it takes the same value at these two
points, and there exists only one stationary point in between. These observations lead to the conclusion
that $\min_{t : t \in [k] \setminus [\frac{k}{2}]} f(t) = f(k) = \frac{1}{k} \log(1 + ak)$.

To analyze the cases for $\beta_1 = 0$ and $\beta_1 < 0$, we only need to note that there is only one solution $v_1$
to (100). Thus, a similar argument applies to these two cases.
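The minimization over subsets in this appendix can be cross-checked by brute force for small $k$. The sketch below assumes, consistently with the Gram matrix $\begin{bmatrix} t & t_1 - t_2 \\ t_1 - t_2 & t \end{bmatrix}$ used above, that the first $k/2$ rows of $W_2$ equal $[1, 1]$ and the remaining rows equal $[1, -1]$; this layout of $W_2$ and the function names are assumptions for illustration, since definition (16) is not reproduced here.

```python
import itertools
import math


def rate(T, k, a):
    """(1 / (2|T|)) log det(I + a * W_T^T W_T) for the assumed two-column W_2."""
    t = len(T)
    t1 = sum(1 for j in T if j < k // 2)   # rows taken from the first half
    t2 = t - t1                            # rows taken from the second half
    det = (1 + a * t) ** 2 - (a * (t1 - t2)) ** 2
    return math.log(det) / (2 * t)


def brute_force_min(k, a):
    """Minimum of the rate over all nonempty subsets T of {0, ..., k-1}."""
    return min(rate(T, k, a)
               for size in range(1, k + 1)
               for T in itertools.combinations(range(k), size))


def closed_form(k, a):
    """Claimed minimum, (1/k) log(1 + a k)."""
    return math.log(1 + a * k) / k


# Cases cover beta_1 > 0, beta_1 = 0 and beta_1 < 0, with beta_1 = 1 - a^2 k^2.
cases = [(6, 0.1), (8, 0.125), (4, 1.0), (6, 2.0)]
results = [(brute_force_min(k, a), closed_form(k, a)) for k, a in cases]
```

In each case the brute-force minimum agrees with $\frac{1}{k}\log(1 + ak)$, attained both at $|T| = k/2$ (all rows from one half) and at $T = [k]$.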